Dplyr II

Quantitative Methodology (UPF)

Jordi Mas Elias

https://www.jordimas.cat/

Summary

  • Summarize categoric variables
  • Summarize numeric variables
  • Summarize and group_by functions
  • Recoding variables

Warm up

R learning curve

RStudio workflow

Load packages.

library(dplyr)
library(tidyr)
library(ggplot2)
library(stringr)

Summarize categoric variables

Categoric variables

festivals$ambit
  [1] "Música"                      "Música"                     
  [3] "Lletres"                     "Arts visuals"               
  [5] "Arts escèniques"             "Música"                     
  [7] "Audiovisuals"                "Audiovisuals"               
  [9] "Multidisciplinaris i altres" "Arts visuals"               
 [11] "Lletres"                     "Audiovisuals"               
 [13] "Música"                      "Música"                     
 [15] "Arts visuals"                "Arts visuals"               
 [17] "Multidisciplinaris i altres" "Multidisciplinaris i altres"
 [19] "Música"                      "Arts visuals"               
 [21] "Música"                      "Multidisciplinaris i altres"
 [23] "Lletres"                     "Música"                     
 [25] "Audiovisuals"                "Multidisciplinaris i altres"
 [27] "Audiovisuals"                "Lletres"                    
 [29] "Multidisciplinaris i altres" "Música"                     
 [31] "Multidisciplinaris i altres" "Audiovisuals"               
 [33] "Multidisciplinaris i altres" "Música"                     
 [35] "Audiovisuals"                "Audiovisuals"               
 [37] "Audiovisuals"                "Audiovisuals"               
 [39] "Audiovisuals"                "Audiovisuals"               
 [41] "Audiovisuals"                "Arts escèniques"            
 [43] "Multidisciplinaris i altres" "Música"                     
 [45] "Multidisciplinaris i altres" "Lletres"                    
 [47] "Audiovisuals"                "Música"                     
 [49] "Multidisciplinaris i altres" "Audiovisuals"               
 [51] "Arts escèniques"             "Arts escèniques"            
 [53] "Arts visuals"                "Multidisciplinaris i altres"
 [55] "Música"                      "Audiovisuals"               
 [57] "Arts visuals"                "Música"                     
 [59] "Audiovisuals"                "Audiovisuals"               
 [61] "Música"                      "Música"                     
 [63] "Multidisciplinaris i altres" "Arts visuals"               
 [65] "Arts escèniques"             "Multidisciplinaris i altres"
 [67] "Lletres"                     "Lletres"                    
 [69] "Audiovisuals"                "Música"                     
 [71] "Música"                      "Multidisciplinaris i altres"
 [73] "Música"                      "Multidisciplinaris i altres"
 [75] "Arts escèniques"             "Música"                     
 [77] "Audiovisuals"                "Audiovisuals"               
 [79] "Multidisciplinaris i altres" "Música"                     
 [81] "Arts escèniques"             "Música"                     
 [83] "Música"                      "Audiovisuals"               
 [85] "Audiovisuals"                "Música"                     
 [87] "Audiovisuals"                "Audiovisuals"               
 [89] "Música"                      "Arts escèniques"            
 [91] "Música"                      "Música"                     
 [93] "Música"                      "Lletres"                    
 [95] "Música"                      "Audiovisuals"               
 [97] "Multidisciplinaris i altres" "Arts escèniques"            
 [99] "Arts escèniques"             "Audiovisuals"               
[101] "Arts escèniques"             "Música"                     
[103] "Multidisciplinaris i altres" "Audiovisuals"               
[105] "Arts escèniques"             "Lletres"                    
[107] "Lletres"                     "Audiovisuals"               
[109] "Multidisciplinaris i altres" "Música"                     
[111] "Música"                      "Música"                     
[113] "Multidisciplinaris i altres" "Audiovisuals"               
[115] "Multidisciplinaris i altres" "Arts visuals"               
[117] "Música"                      "Música"                     
[119] "Arts visuals"                "Lletres"                    
[121] "Multidisciplinaris i altres" "Audiovisuals"               
[123] "Arts escèniques"             "Música"                     
[125] "Audiovisuals"                "Audiovisuals"               
[127] "Música"                      "Lletres"                    
[129] "Multidisciplinaris i altres" "Música"                     
[131] "Lletres"                     "Música"                     
[133] "Lletres"                     "Multidisciplinaris i altres"
[135] "Arts visuals"                "Arts visuals"               
[137] "Arts escèniques"             "Música"                     
[139] "Lletres"                     "Audiovisuals"               
[141] "Audiovisuals"                "Música"                     
[143] "Audiovisuals"                "Audiovisuals"               
[145] "Multidisciplinaris i altres" "Arts escèniques"            
[147] "Música"                      "Multidisciplinaris i altres"
[149] "Lletres"                     "Audiovisuals"               
[151] "Audiovisuals"                "Audiovisuals"               
[153] "Audiovisuals"                "Audiovisuals"               
[155] "Audiovisuals"                "Multidisciplinaris i altres"
[157] "Multidisciplinaris i altres" "Audiovisuals"               
[159] "Lletres"                     "Multidisciplinaris i altres"
[161] "Música"                      "Multidisciplinaris i altres"
[163] "Música"                      "Música"                     
[165] "Audiovisuals"                "Música"                     
[167] "Multidisciplinaris i altres" "Audiovisuals"               
[169] "Audiovisuals"                "Música"                     
[171] "Audiovisuals"                "Audiovisuals"               
[173] "Multidisciplinaris i altres" "Multidisciplinaris i altres"
[175] "Multidisciplinaris i altres" "Lletres"                    
[177] "Arts escèniques"             "Arts escèniques"            
[179] "Arts escèniques"             "Audiovisuals"               
[181] "Arts escèniques"             "Música"                     
[183] "Arts escèniques"             "Audiovisuals"               
[185] "Música"                      "Audiovisuals"               
[187] "Música"                      "Música"                     
[189] "Lletres"                     "Audiovisuals"               
[191] "Lletres"                     "Música"                     
[193] "Arts escèniques"             "Música"                     
[195] "Música"                      "Audiovisuals"               
[197] "Arts visuals"                "Multidisciplinaris i altres"
[199] "Audiovisuals"                "Música"                     
[201] "Multidisciplinaris i altres" "Música"                     
[203] "Música"                      "Multidisciplinaris i altres"
[205] "Música"                      "Música"                     
[207] "Audiovisuals"               

Categoric variables

Mode: the most frequent value.

  • Count: count(df, v, sort = T)
count(festivals, ambit, sort = T)
# A tibble: 6 × 2
  ambit                           n
  <chr>                       <int>
1 Música                         59
2 Audiovisuals                   56
3 Multidisciplinaris i altres    38
4 Arts escèniques                21
5 Lletres                        20
6 Arts visuals                   13
  • Frequency table: table(df$v)
table(festivals$ambit)

            Arts escèniques                Arts visuals 
                         21                          13 
               Audiovisuals                     Lletres 
                         56                          20 
Multidisciplinaris i altres                      Música 
                         38                          59 
  • Percentage frequency table: prop.table(table(df$v)) * 100
prop.table(table(festivals$ambit)) * 100

            Arts escèniques                Arts visuals 
                  10.144928                    6.280193 
               Audiovisuals                     Lletres 
                  27.053140                    9.661836 
Multidisciplinaris i altres                      Música 
                  18.357488                   28.502415 
  • Bar plot: barplot(table(df$v))
barplot(table(festivals$ambit))

  • In %: barplot(prop.table(table(df$v)) * 100)
barplot(prop.table(table(festivals$ambit)) * 100)
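
The same percentage table can also be built with dplyr, which keeps the result as a data frame instead of a named vector. A minimal sketch, assuming a small made-up tibble called `toy` (not the `festivals` data); the column name `pct` is also an assumption for illustration:

```r
library(dplyr)

# Hypothetical toy data, made up for illustration
toy <- tibble(ambit = c("music", "music", "books", "film", "music"))

# Count each category and add a percentage column
freq <- toy |>
  count(ambit, sort = TRUE) |>
  mutate(pct = n / sum(n) * 100)

freq

# With sort = TRUE the largest count comes first,
# so the mode is the category in the first row
mode_value <- freq$ambit[1]
mode_value
```

Reading the mode from the first row only works when there is a single most frequent category; with ties, several rows share the top count.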

Summarize numeric variables

Numeric variables

Central value: Where is the center?

lloguer_any$preu
   [1]  589.55  712.79  540.71  673.44  736.09  673.37  921.40  827.87  716.13
  [10]  693.43  568.00      NA  553.55  631.50  580.71  604.74  584.27  605.28
  [19]  777.23  700.16 1230.00  837.33 1042.16 1215.85  927.63 1053.70  750.84
  [28]  679.66  583.93  634.48  679.80  671.00  622.63  568.94  586.50  585.45
  [37]  478.70  492.79  557.00  537.94  737.73      NA  556.30  555.89  523.29
  [46]  475.06  241.99  613.58  563.33  461.28  503.47  521.81  442.94      NA
  [55]  359.27      NA  461.10      NA  488.14  581.16  626.26  563.42  613.47
  [64]  580.09  604.85  704.67  926.43  663.93  866.70  516.90  635.70  643.66
  [73]  568.68  550.63  728.79  541.93  676.98  706.16  663.23  948.96  821.81
  [82]  708.00  718.01  578.65      NA  454.44  624.88  584.91  616.40  569.68
  [91]  618.80  765.29  748.64 1459.12 1051.22 1030.35 1327.22  920.62 1006.56
 [100]  761.35  648.32  551.40  702.33  673.79  665.44  610.87  525.04  564.10
 [109]  772.41  495.01  476.94  568.75  535.88  675.63      NA  566.67  573.24
 [118]  517.06  461.25  295.39  613.14  474.29  448.60  465.56  501.40  449.31
 [127]  320.21  406.17      NA  407.26  231.40  476.31  573.12  622.71  556.26
 [136]  590.17  592.63  641.11  678.23  988.89  644.01  817.93  480.69  643.46
 [145]  641.75  569.14  576.45  790.74  578.88  699.10  719.84  697.69  934.55
 [154]  869.72  746.78  713.89  580.89      NA  542.81  630.89  605.58  603.16
 [163]  629.58  624.78  786.29  768.46 1506.20  988.66 1081.14 1386.13  963.57
 [172] 1078.84  766.33  642.66  537.15  684.28  691.69  711.14  613.17  536.04
 [181]  585.39  703.15  498.79  469.77  542.51  578.50  715.00      NA  562.86
 [190]  567.26  522.70  459.42  145.45  607.74  532.67  444.29  484.98  479.48
 [199]  416.65      NA  372.44      NA  419.12      NA  503.93  593.22  570.36
 [208]  596.63  635.11  604.83  607.62  739.94 1121.06  652.23  896.42  423.45
 [217]  630.29  614.55  569.84  597.00  761.68  593.73  687.33  735.99  687.26
 [226]  975.96  848.56  753.61  720.14  602.55      NA  574.45  679.58  603.86
 [235]  621.68  602.28  647.14  802.23  806.43 1762.06 1754.44 1004.37 1352.08
 [244]  948.26 1047.50  761.70  691.94  552.75  655.87  681.25  681.26  638.55
 [253]  545.11  590.31  784.44  506.26  506.77  550.45  544.47  672.22      NA
 [262]  585.58  579.11  552.56  501.06      NA  620.83  566.36  450.40  502.41
 [271]  515.99  420.98      NA  360.94      NA  400.76      NA  514.58  557.10
 [280]  616.27  607.77  647.84  615.61  643.75  634.12  991.06  658.01 1130.29
 [289]  504.19  649.21  636.30  552.58  601.31  756.68  581.34  697.16  753.76
 [298]  679.03  957.55  859.64  766.18  740.41  578.76      NA  492.49  728.25
 [307]  614.15  621.71  606.48  616.05  822.36  727.99 1250.83 1335.71  917.13
 [316] 1335.63  954.00 1088.66  758.44  777.25  576.44  673.20  707.17  701.69
 [325]  653.16  521.06  601.06  733.81  485.74  505.02  543.33  608.87  740.00
 [334]      NA  585.54  561.58  545.86  455.96      NA  626.23      NA  461.89
 [343]  469.83  515.49  433.81      NA  401.17      NA  383.62  303.80  504.27
 [352]  585.98  619.16  595.43  665.89  615.46  618.57  722.23 1093.14  682.84
 [361]  914.75  512.89  612.11  662.25  609.22  598.91  771.13  598.16  708.03
 [370]  770.13  704.44  988.89  906.14  766.85  746.00  598.47  381.07  536.18
 [379]  687.33  601.60  607.68  606.78  627.86  817.24  679.54 1711.45  824.38
 [388] 1101.15 1337.79 1056.07 1095.70  790.21  712.75  539.58  637.81  726.61
 [397]  712.64  631.05  539.56  596.15  749.23  504.61  547.25  558.49  621.76
 [406]  701.43      NA  568.31  583.05  556.50  485.40      NA  562.45  575.83
 [415]  466.89  419.79  531.27  456.71  420.63  399.19  450.00  464.95  362.94
 [424]  526.12  609.55  608.62  594.29  650.76  624.24  636.97  755.44 1116.30
 [433]  711.46  981.86  505.97  670.38  697.32  595.42  654.00  773.00  630.00
 [442]  729.00  753.00  739.00 1052.00  926.00  807.00  798.00  631.00      NA
 [451]  565.00  735.00  632.00  661.00  652.00  701.00  860.00  814.00 2034.00
 [460] 1119.00 1156.00 1462.00 1073.00 1262.00  840.00  713.00  576.00  743.00
 [469]  742.00  752.00  636.00  586.00  620.00  611.00  528.00  498.00  554.00
 [478]  626.00  655.00  368.00  572.00  581.00  542.00  509.00      NA  555.00
 [487]  558.00  476.00  506.00  544.00  466.00      NA  381.00      NA  449.00
 [496]  187.00  577.00  631.00  637.00  636.00  685.00  669.00  673.00  774.00
 [505] 1173.00  764.00 1023.00  516.00  679.00  672.00  633.00  644.73  831.13
 [514]  601.38  751.60  792.53  726.93 1035.07  941.58  789.76  772.11  638.47
 [523]  411.36  566.01  648.84  640.90  657.94  635.26  658.42  909.27  764.77
 [532] 1860.30  841.10 1227.35 1385.99 1076.80 1153.93  808.38  745.31  542.71
 [541]  751.72  749.24  735.76  649.09  550.49  630.99  775.29  528.77  527.53
 [550]  545.82  654.09  689.78      NA  594.19  615.06  577.52  555.18      NA
 [559]  538.10  598.04  448.63  503.35  528.70  438.91  366.67  400.29      NA
 [568]  455.88      NA  539.29  622.94  682.44  618.87  688.36  660.72  672.11
 [577]  737.31 1015.10  773.52 1211.01  492.10  685.35  666.84  602.19  624.42
 [586]  893.42  638.75  759.16  800.08  752.37 1104.26  947.30  811.62  805.89
 [595]  622.24  360.88  447.57  638.73  662.28  645.15  629.69  664.66  892.70
 [604]  777.86 1572.56  933.95 1195.68 1380.68 1116.51 1187.57  827.82  741.42
 [613]  590.53  684.47  764.02  802.69  696.69  595.92  646.76  749.81  518.24
 [622]  513.53  616.36  614.79  692.35      NA  601.35  618.55  536.66  537.93
 [631]      NA  589.20  619.85  455.65  539.59  518.02  423.18  392.44  344.26
 [640]  418.00  422.37  142.34  472.39  624.75  658.12  613.71  665.34  659.58
 [649]  645.42  799.63 1136.00  719.12 1004.51  506.70  758.81  688.49  597.89
 [658]  662.28  806.24  659.35  773.62  808.08  739.86 1120.40  960.80  842.53
 [667]  793.64  651.21      NA  548.51  751.69  673.24  672.13  702.86  681.39
 [676]  912.53  821.20 1660.43 1117.72 1197.62 1560.70 1096.73 1189.10  891.63
 [685]  755.85  592.76  772.96  773.66  780.36  642.91  632.00  634.75  689.66
 [694]  550.30  544.56  592.36  635.14  786.32      NA  620.05  616.99  574.38
 [703]  547.65      NA  613.96  621.43  475.41  526.19  550.31  450.85  447.05
 [712]  427.97      NA  483.22      NA  588.23  663.31  619.99  635.81  704.45
 [721]  690.84  688.85  759.08 1151.48  785.11 1116.09  521.08  727.81  679.25
 [730]  613.21  666.17  851.63  686.63  839.58  868.35  799.38 1143.98  986.21
 [739]  890.54  883.32  696.74      NA  577.47  764.38  702.72  674.88  710.45
 [748]  740.50  949.97  892.28 1689.60 1302.14 1172.86 1619.04 1083.53 1278.66
 [757]  876.77  830.24  635.15  855.89  827.81  805.87  718.50  645.66  679.39
 [766]  675.69  557.55  570.10  603.75  646.43  796.92      NA  634.35  640.48
 [775]  617.22  538.10      NA  632.29  648.75  512.34  495.52  584.53  419.48
 [784]    0.00  373.14      NA  459.84  256.45  558.42  658.49  670.19  691.95
 [793]  712.32  715.52  708.80  839.08 1142.76  828.61 1268.48  556.45  856.58
 [802]  716.17  617.01  716.70  913.96  686.31  854.18  816.51  806.40 1174.38
 [811] 1012.22  926.21  882.47  706.11      NA  613.16  786.35  703.28  730.45
 [820]  711.10  740.28  983.48  866.43 1692.20 1354.62 1185.85 1490.72 1197.72
 [829] 1298.95  904.17  818.69  653.69  784.91  829.23  823.72  747.31  658.74
 [838]  705.71  761.28  597.13  672.65  623.60  668.46  778.67      NA  668.66
 [847]  684.20  672.64  582.12  301.82  666.83  617.52  523.02  565.89  589.24
 [856]  438.65  416.39  446.13      NA  552.81      NA  555.12  690.64  709.11
 [865]  662.25  768.66  720.39  728.87  886.66 1285.61  840.76 1181.15  571.60
 [874]  987.18  720.74  591.86  734.99  905.26  722.78  895.28  871.08  847.04
 [883] 1151.09 1001.49  909.08  876.97  715.71      NA  689.16  806.34  680.17
 [892]  739.35  697.58  737.27  936.32  896.45 1856.57 1248.00 1291.50 1516.52
 [901] 1182.12 1268.61  913.45  946.59  687.42  841.38  837.09  839.29  743.88
 [910]  637.30  722.67  812.93  597.09  663.90  622.33  716.43  816.05      NA
 [919]  669.89  687.37  654.58  587.02      NA  634.54  689.62  504.31  595.48
 [928]  624.58  540.68  418.07  459.19      NA  497.84      NA  622.82  708.60
 [937]  575.23  696.04  716.78  756.57  774.97  850.73 1108.37  868.90 1148.20
 [946]  588.90  999.62  742.06  677.91  714.29  968.16  738.25  930.13  884.99
 [955]  867.97 1218.18 1044.28  949.96  887.78  756.80      NA  634.28  785.02
 [964]  749.61  760.98  766.05  780.27 1006.84  882.05 1666.63 1101.67 1423.86
 [973] 1646.76 1236.21 1287.23  902.42  917.84  701.29  855.70  852.57  872.41
 [982]  756.13  710.18  722.28  721.47  597.58  671.85  680.80  677.09  819.11
 [991]      NA  708.69  699.81  640.95  600.84  407.22  632.43  677.50  505.97
[1000]  579.67  617.61  533.30      NA  436.17  302.52  434.91      NA  647.69
[1009]  728.09  656.71  724.28  783.88  765.74  740.91  907.14 1345.63  899.89
[1018]  965.36  606.61  941.16  750.71  671.87  764.85 1028.49  758.18  927.16
[1027]  933.54  906.78 1179.11 1098.21  950.15  922.46  759.87      NA  711.93
[1036]  817.64  777.64  861.69  796.50  812.48 1013.57  970.71 1882.42 1360.42
[1045] 1364.01 1757.00 1324.40 1377.82  962.41  912.51  799.44  804.26  867.94
[1054]  880.00  804.26  698.03  739.68  922.75  610.13  643.74  713.38  749.00
[1063]  805.87      NA  704.57  750.43  670.43  613.80      NA  625.23  701.50
[1072]  560.56  593.64  625.66  538.45  463.17  423.09      NA  532.33      NA
[1081]  653.72  744.73  692.49  761.19  790.19  797.70  775.27  905.24 1189.26
[1090]  917.87 1093.34  625.13  933.90  776.09  711.99  776.73  993.16  766.03
[1099]  914.78  910.78  877.02 1225.08 1081.29  957.00  905.83  760.23      NA
[1108]  728.07  827.36  771.78  786.53  772.86  806.56 1043.76  935.38 1737.92
[1117] 1213.95 1334.77 1588.53 1198.24 1315.80  938.20  875.53  711.57  741.29
[1126]  879.85  862.51  773.02  693.65  759.06  822.82  609.11  616.32  672.70
[1135]  754.69  865.42      NA  708.88  723.34  682.83  653.52      NA  693.40
[1144]  656.15  565.18  600.37  641.21  551.22  448.23  425.51      NA  535.92
[1153]      NA  576.97  751.95  732.23  742.89  805.10  789.59  777.21  916.33
[1162] 1349.82  911.12 1233.01  621.32  864.09  790.58  692.67  792.74  998.40
[1171]  870.84  923.44  910.53  883.75 1191.43 1069.16  947.92  899.98  783.50
[1180]      NA  751.65  842.32  754.83  785.79  764.41  802.64 1002.34  948.77
[1189] 1799.91  895.70 1249.23 1613.41 1247.70 1310.42  946.65  886.27  741.23
[1198]  815.62  899.65  884.55  832.63  721.46  736.80  824.12  634.75  690.81
[1207]  648.10  717.78  874.20      NA  748.19  720.90  675.08  665.36      NA
[1216]  685.05  714.68  605.41  624.71  634.30  592.73      NA  507.16      NA
[1225]  589.02      NA  658.63  744.82  734.45  754.09  790.20  785.88  774.32
[1234]  916.31 1272.26  913.29 1164.84  682.32  957.02  751.48  715.37  820.88
[1243] 1024.29  923.45  991.39  942.96  893.43 1263.08 1078.13  975.53  907.42
[1252]  788.23      NA  719.33  834.68  801.37  757.26  755.29  811.74 1038.73
[1261]  959.26 1563.31  860.13 1332.33 1467.07 1288.82 1328.14  972.89  939.03
[1270]  716.25  883.52  913.34  885.40  817.38  737.38  758.61  756.72  637.17
[1279]  682.44  646.83  716.18  864.72      NA  742.51  726.89  713.94  668.16
[1288]      NA  724.25  661.88  575.71  628.80  657.43  574.15  434.52  491.84
[1297]      NA  598.29      NA  687.55  752.66  798.39  751.16  797.80  806.97
[1306]  815.82 1198.77 1288.91  960.99 1371.14  649.17  913.66  797.54  719.55
[1315]  849.32 1012.49  889.24  970.97  968.85  934.89 1216.70 1136.21 1003.89
[1324]  970.26  822.04      NA  755.18  871.41  814.55  830.32  786.80  856.57
[1333] 1077.99 1013.28 1849.52 1102.29 1443.71 1746.01 1322.61 1374.32 1021.27
[1342]  905.82  756.36  870.31  923.39  924.81  839.47  766.82  797.84  876.58
[1351]  654.77  705.60  660.93  763.82  822.31      NA  754.53  742.82  712.12
[1360]  682.67      NA  732.86  715.45  638.14  618.91  667.72  584.73      NA
[1369]  514.34      NA  597.93      NA  693.69  777.06  783.09  787.78  823.74
[1378]  827.45  815.10  897.67 1284.30 1006.81  818.55  699.06  928.02  803.49
[1387]  767.10  827.27 1057.26  913.20 1000.52  983.10  930.02 1340.38 1164.32
[1396] 1017.56  960.30  813.74      NA  778.52  810.24  811.00  825.36  831.19
[1405]  834.64 1031.46  960.03 1615.31  898.33 1422.80 1567.91 1283.24 1374.82
[1414] 1031.54  971.51  748.90  930.09  927.06  945.81  827.04  727.05  796.35
[1423]  840.52  653.21  721.23  666.91  750.95  876.65      NA  745.35  782.95
[1432]  723.34  692.05      NA  708.41  774.38  595.81  649.22  698.08  575.38
[1441]      NA  514.52      NA  564.56  673.69  696.06  782.15  798.32  804.74
[1450]  840.01  851.58  835.55  938.59 1292.23  970.64 1307.79  711.67  999.63
[1459]  823.46  756.10  822.60 1060.90  891.30  990.20  957.20  936.30 1269.10
[1468] 1123.00 1008.50  986.20  815.80      NA  714.90  857.60  810.00  844.10
[1477]  823.50  873.00 1070.00  938.50 1732.50 1103.50 1297.70 1645.90 1226.40
[1486] 1357.90 1057.10  907.50  750.60  883.40  933.40  937.30  859.10  791.20
[1495]  789.80  796.50  670.60  704.40  759.20  780.40  870.20      NA  777.00
[1504]  758.50  722.80  685.30      NA  751.00  791.90  636.10  633.20  677.40
[1513]  612.30      NA  575.40      NA  601.20      NA  705.80  795.30  807.60
[1522]  772.30  855.30  845.30  828.80  950.80 1304.00  952.50 1231.20  694.80
[1531]  959.20  840.20  764.60  867.00 1112.10  924.00 1010.10  988.70  942.90
[1540] 1277.90 1139.80 1032.90  985.20  823.40      NA  753.10  854.20  824.20
[1549]  859.50  858.40  867.50 1085.10  973.00 1746.70 1121.60 1334.20 1602.50
[1558] 1310.00 1360.30 1115.70 1028.90  819.80  899.20  951.20  978.30  866.80
[1567]  764.50  825.00 1044.00  666.90  725.90  705.10  802.40  796.60      NA
[1576]  760.60  773.70  775.30  702.40      NA  723.10  756.40  625.30  653.30
[1585]  673.70  640.20      NA  559.80      NA  614.70      NA  659.20  800.90
[1594]  807.50  782.70  836.90  844.30  829.40 1066.60 1444.60  996.70 1388.10
[1603]  844.50  960.10  844.10  798.10  869.10 1082.40  871.60 1020.20 1031.30
[1612]  979.60 1348.90 1194.40 1081.40  987.20  865.80      NA  790.10  881.50
[1621]  834.60  832.90  888.30  884.30 1113.90 1040.30 1950.10 1355.50 1449.50
[1630] 1614.80 1317.00 1412.20 1070.20  953.70  813.80  926.40  991.50  996.30
[1639]  902.20  762.50  819.70  880.20  703.30  773.70  756.00  792.40  830.00
[1648]      NA  788.60  792.20  749.20  690.00      NA  721.10  714.40  612.30
[1657]  696.10  696.30  617.90      NA  532.60      NA  569.90      NA  654.30
[1666]  805.00  827.10  805.90  864.60  879.50  899.10  959.60 1332.40 1024.30
[1675] 1409.40  728.30  996.40  829.70  770.20  844.90 1139.70  893.00  966.40
[1684] 1064.40  980.00 1333.00 1187.60 1083.70 1018.40  846.20  350.50  816.70
[1693]  868.80  854.40  870.00  838.00  884.80 1116.20 1078.10 2023.40 1220.00
[1702] 1398.40 1745.00 1361.60 1455.90 1099.80  987.00  817.50  954.40  975.70
[1711] 1017.60  897.70  755.90  843.10  850.90  700.20  747.30  740.20  798.50
[1720]  954.50      NA  802.50  801.80  756.80  710.70      NA  783.00  722.50
[1729]  610.50  668.70  692.50  561.40      NA  572.90      NA  622.30      NA
[1738]  713.40  819.30  829.10  816.70  856.80  857.90  871.60 1072.80 1357.80
[1747] 1051.30 1498.60  693.80 1013.60  835.70  781.00  864.50 1110.20  899.50
[1756] 1030.00 1059.90  952.60 1349.60 1150.80 1043.80 1010.50  837.00      NA
[1765]  810.10  882.40  865.00  870.00  858.30  891.40 1069.90  989.60 1626.90
[1774] 1191.30 1417.70 1662.40 1311.50 1388.90 1053.90  966.40  814.80  939.70
[1783]  963.60  967.90  920.80  816.40  841.00  928.30  707.90  741.30  698.90
[1792]  788.10  963.40      NA  815.70  789.90  690.40  686.40      NA  735.40
[1801]  778.00  636.80  675.40  714.80  565.80  452.00  546.10      NA  605.00
[1810]  584.10  725.80  798.90  777.20  815.50  873.00  887.30  865.80  969.10
[1819] 1427.50 1037.00 1233.20  749.70 1000.10  863.00  765.30  853.30 1041.00
[1828]  856.50  919.80  999.90  943.50 1249.40 1159.00 1029.50  986.10  856.20
[1837]      NA  687.80  825.50  867.80  893.60  849.50  874.30 1096.60  946.10
[1846] 1822.70 1078.10 1432.20 1679.10 1272.10 1304.40 1073.30 1060.50  910.30
[1855]  998.50  950.90  968.70  916.10  739.30  860.40  997.30  682.80  695.60
[1864]  791.30  758.90  879.20      NA  767.60  803.20  725.90  721.30      NA
[1873]  686.60      NA  633.80  603.90  707.80  623.80      NA  508.40      NA
[1882]  701.00      NA  759.50  835.30  809.50  788.20  898.50  853.80  890.90
[1891]  975.70 1446.10  987.10  840.90  696.90 1076.60  854.30  767.50  833.90
[1900]  982.50  823.10  970.90  997.40  973.00 1273.90 1192.30 1067.40  959.30
[1909]  820.20      NA  828.20  873.90  842.10  892.30  858.20  883.10 1058.10
[1918] 1012.40 1932.30 1192.80 1314.40 1699.40 1321.90 1427.90 1073.40  909.90
[1927]  777.60 1001.20  956.50  958.20  901.00  848.50  825.80  926.50  706.00
[1936]  748.50  787.40  770.20  830.00      NA  788.00  842.00  759.30  712.80
[1945]      NA  771.40  833.30  627.00  692.20  712.10  583.80      NA  572.40
[1954]      NA  668.50      NA  738.00  825.90  795.00  834.20  858.40  894.20
[1963]  869.10 1006.30 1346.70 1006.90 1074.40  801.50  956.80  822.40  782.40
[1972]  787.00 1003.90  875.00  888.90  959.50  902.10 1208.80 1126.00 1001.40
[1981]  940.70  810.20      NA  762.70  834.30  842.90  817.40  823.20  850.30
[1990] 1037.90  993.30 1577.10 1390.00 1338.40 1729.50 1286.00 1307.10 1033.30
[1999] 1026.80  790.40  941.50  943.20  928.80  845.60  716.30  786.50  856.40
[2008]  651.40  679.00  671.70  798.30  817.10      NA  747.50  751.80  723.60
[2017]  681.30      NA  750.20  756.10  651.30  644.40  673.80  600.20      NA
[2026]  514.10      NA  610.00      NA  688.90  794.00  766.60  779.40  846.30
[2035]  817.70  870.80  952.20 1332.40  974.70 1392.60  731.60  859.90  807.40
[2044]  765.90  917.00 1181.60  999.80 1022.80  957.90  984.30 1294.80 1105.70
[2053] 1009.30  999.90  846.20      NA  728.70  823.90  835.70  971.80  805.40
[2062]  849.40  995.00 1018.60 1649.80 1102.20 1362.60 1630.80 1195.00 1341.50
[2071] 1054.90  947.80  809.90  874.30  986.00  957.90  834.70  737.60  801.70
[2080]  784.70  671.30  723.90  706.40  736.10  888.90      NA  740.30  767.00
[2089]  718.50  662.80      NA  644.70  655.00  641.20  652.70  687.20  591.00
[2098]      NA  517.70      NA  616.90      NA  702.60  800.90  751.50  795.80
[2107]  847.20  841.50  861.20 1089.20 1305.00  967.90  964.70  796.90  939.20
[2116]  796.40  733.90  872.00 1062.50  900.50 1051.40 1106.90  985.90 1328.60
[2125] 1174.60 1012.20 1004.30  879.30 1407.90  638.80  912.10  826.20 1025.00
[2134]  848.50  885.60 1067.80  993.10 1944.50  894.90 1349.30 1767.00 1369.10
[2143] 1484.50 1108.60  969.30  759.20 1010.30  982.60  977.80  903.90  812.40
[2152]  810.00  840.10  725.80  786.50  720.30  750.70  865.30      NA  794.20
[2161]  855.50  728.50  709.80      NA  748.80  851.30  632.90  609.00  693.30
[2170]  574.90      NA  577.30      NA  620.00      NA  732.60  788.20  803.50
[2179]  784.30  878.40  875.10  884.00 1173.20 1442.60 1085.60 1238.70  779.20
[2188] 1032.90  843.00  757.00  928.10 1111.30  922.90 1084.90 1118.70 1038.70
[2197] 1445.60 1252.20 1113.90 1072.20  888.30 1380.00  785.90  971.90  910.50
[2206] 1136.70  930.10  940.30 1203.40 1107.20 1946.10 1364.70 1541.40 1990.20
[2215] 1473.10 1550.10 1149.90 1073.60  839.90  997.50 1043.40 1032.50  948.70
[2224]  831.90  872.60  880.30  753.20  810.50  759.00  853.50  907.30      NA
[2233]  814.10  838.60  765.70  760.50      NA  816.00      NA  676.10  660.10
[2242]  731.30  661.40  552.70  575.30      NA  668.00      NA  776.80  855.60
[2251]  850.70  869.10  890.00  916.30  936.40 1169.30 1499.50 1104.20 1354.90
[2260]  761.40 1059.40 1158.90  775.80  759.20  929.60  808.90  879.40  918.00
[2269]  887.10 1176.70 1040.20  964.00  898.70  781.60      NA  810.60  799.50
[2278]  778.40  807.60  788.10  829.10  990.30  927.30 1598.20 1531.70 1263.90
[2287] 1578.00 1264.80 1294.90  990.10  959.00  745.40  859.10  910.60  865.30
[2296]  817.30  741.10  791.70  795.00  668.80  716.10  700.50  741.40  827.00
[2305]      NA  747.40  736.50  707.20  687.20      NA  691.60  762.50  643.60
[2314]  631.60  671.80  605.00      NA  520.20      NA  625.10      NA  658.90
[2323]  762.50  759.30  756.90  824.20  814.50  853.60  927.50 1298.90  928.50
[2332]  978.40  704.20  976.70  794.60  740.80  774.20  934.10  814.80  862.10
[2341]  907.10  887.40 1196.30 1070.10  953.30  928.80  773.70      NA  772.40
[2350]  810.80  772.80  815.40  778.10  815.40  998.80  904.80 1680.30 1074.70
[2359] 1298.00 1429.00 1226.10 1339.10 1024.40  924.50  796.80  859.20  881.00
[2368]  880.10  820.20  773.50  818.20  817.70  667.60  713.80  653.50  729.90
[2377]  884.20      NA  734.90  754.40  698.50  671.80      NA  698.30  673.80
[2386]  608.70  648.90  680.90  598.60      NA  488.10      NA  592.40  447.50
[2395]  741.90  757.20  737.40  791.30  827.20  795.60  786.40  911.10 1248.60
[2404]  912.40  948.30  700.80  884.80  799.70  728.20  770.50  968.40  826.00
[2413]  889.10  970.20  912.40 1168.40 1072.90  968.60  934.30  787.00      NA
[2422]  698.50  858.70  804.60  832.50  805.90  836.70 1017.00  982.90 1816.50
[2431] 1009.50 1345.20 1566.50 1159.60 1352.50 1004.70  961.00  755.50  934.60
[2440]  891.60  899.70  836.30  753.70  823.60  817.30  679.30  686.30  785.60
[2449]  745.90  878.10      NA  754.90  747.50  712.80  681.80      NA  653.20
[2458]  769.90  628.70  666.60  669.50  610.40      NA  547.70      NA  585.00
[2467]      NA  644.10  792.90  772.80  751.90  805.30  830.80  822.20 1024.40
[2476] 1343.10  971.30 1162.90  736.10  962.70  804.20  752.90  772.80  953.00
[2485]  877.90  922.80  976.90  924.40 1220.80 1114.90  965.90  955.70  777.90
[2494]      NA  691.90  827.20  809.50  872.50  800.00  833.70 1052.80  920.60
[2503] 1679.30 1214.50 1292.90 1597.40 1219.60 1433.70  998.30  920.10  754.50
[2512]  899.90  920.90  926.50  821.00  757.70  802.60  835.30  683.80  739.70
[2521]  675.70  744.20  859.10      NA  760.40  803.70  740.40  691.90      NA
[2530]  664.20  763.30  605.50  647.70  683.90  557.00      NA  546.80      NA
[2539]  626.60  616.70  696.80  754.00  798.60  799.20  825.40  842.10  833.80
[2548] 1040.70 1364.70  970.70 1205.30  700.50  977.80  826.20  752.00

Numeric variables

Central value: Where is the center? median(), mean().
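
Because `preu` contains missing values, both functions return NA unless they are told to drop them with `na.rm = TRUE`. A small sketch with a made-up vector (not the `lloguer_any` data) also shows how one extreme value pulls the mean upwards but leaves the median unchanged:

```r
# Hypothetical prices, made up for illustration:
# one missing value and one extreme value
prices <- c(500, 550, 600, 650, NA, 2000)

mean(prices)                  # NA, because of the missing value
mean(prices, na.rm = TRUE)    # 860
median(prices, na.rm = TRUE)  # 600
```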

Numeric variables

Central value: Where is the mode? Look at the peak of the histogram: hist(df$v).

Numeric variables

Dispersion: How dispersed are the numbers?

  • Min: min()
  • Max: max()
  • Range: diff(range())
  • Quantiles: quantile()
  • Interquartile range: IQR()

Numeric variables

Dispersion: How dispersed are the numbers?

Min: min()

min(cens_gc$periode_desaparicio_1, na.rm = T)
[1] 1929

Max: max()

max(cens_gc$periode_desaparicio_1, na.rm = T)
[1] 1968

Range: diff(range())

diff(range(cens_gc$periode_desaparicio_1, na.rm = T))
[1] 39

Quantiles: quantile()

quantile(cens_gc$periode_desaparicio_1, 0.25, na.rm = T)
 25% 
1938 
quantile(cens_gc$periode_desaparicio_1, 0.75, na.rm = T)
 75% 
1938 

Interquartile range: IQR()

IQR(cens_gc$periode_desaparicio_1, na.rm = T)
[1] 0
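
In the output above both quartiles equal 1938, which is why IQR() returns 0. The sketch below repeats the idea on a made-up vector `x` where the quartiles differ, and checks that the interquartile range is the third quartile minus the first.

```r
# Hypothetical toy vector (not from the course data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Several quantiles at once
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q

# Interquartile range: distance between the third and the first quartile
iqr_manual <- unname(q["75%"] - q["25%"])
iqr_manual
IQR(x)
```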

Numeric variables

Dispersion: How dispersed are the numbers?

  • Standard deviation: How far from the mean? sd()

Numeric variables

Dispersion: How dispersed are the numbers?

  • What exactly is the standard deviation? Follow the example:

A. We have a dataframe, with two numeric vectors.

devs <- tibble(vec1 = c(2, 6, 4, 8),
               vec2 = c(1, 9, 3, 7))

B. Both vectors have the same mean.

mean(devs$vec1)
[1] 5
mean(devs$vec2)
[1] 5

C. But how far are each vector’s values from the mean? Visually, we observe that the values in vec2 are, on average, farther from the mean than the values in vec1.

devs |> 
  pivot_longer(vec1:vec2) |> 
  ggplot(aes(x = name, y = value)) +
  geom_point(aes(col = name), size = 4) +
  stat_summary(geom = "point", size = 4) +
  theme_minimal()

D. The standard deviation measures this distance, although it is not exactly the average distance. It is computed as follows. Take the example of vec1:

  • First, we calculate the distance of each value from the mean.
diff <- devs$vec1 - mean(devs$vec1)
diff
[1] -3  1 -1  3
  • Then, we square the differences (so values farther from the mean weigh more than closer ones).
sq_diff <- diff^2
sq_diff
[1] 9 1 1 9
  • We sum the values, and divide the result by the number of cases minus one. The result is called the variance.
variance <- sum(sq_diff) / (length(sq_diff) - 1)
variance
[1] 6.666667
  • The standard deviation is the square root of the variance.
sqrt(variance)
[1] 2.581989

E. All in one:

devs |> 
  mutate(across(vec1:vec2, ~ . - mean(.))) |> # distance from the mean
  mutate(across(vec1:vec2, ~ .^2)) |> # square all values
  summarize(across(vec1:vec2, ~ sum(.) / (length(.) - 1))) |> # sum and divide by the number of cases minus one
  mutate(across(vec1:vec2, ~ sqrt(.))) # square root
# A tibble: 1 × 2
   vec1  vec2
  <dbl> <dbl>
1  2.58  3.65

F. Or easier:

devs |> 
  summarize(across(vec1:vec2, ~ sd(.)))
# A tibble: 1 × 2
   vec1  vec2
  <dbl> <dbl>
1  2.58  3.65

G. What if, instead of the standard deviation, we calculate the average distance from the mean? First we calculate the absolute distance of each value.

abs_dist1 <- abs(devs$vec1 - mean(devs$vec1))
abs_dist2 <- abs(devs$vec2 - mean(devs$vec2))

And then we calculate the average.

mean(abs_dist1)
[1] 2
mean(abs_dist2)
[1] 3

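Because the standard deviation squares the distances before averaging them, values far from the mean weigh more, and the result is at least as large as the average absolute distance (2.58 vs. 2 for vec1, 3.65 vs. 3 for vec2).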
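
A short sketch of how an extreme value affects both measures. The vector `x_out` and the helper `mean_abs_dist()` are hypothetical, written only for this illustration (note that base R's mad() is a different statistic, the median absolute deviation).

```r
# vec1 from the example, and a hypothetical copy where 8 becomes an extreme value
x <- c(2, 6, 4, 8)
x_out <- c(2, 6, 4, 28)

# Hypothetical helper: mean absolute distance from the mean
mean_abs_dist <- function(v) mean(abs(v - mean(v)))

# Compare both dispersion measures with and without the extreme value
c(sd = sd(x), mean_abs = mean_abs_dist(x))
c(sd = sd(x_out), mean_abs = mean_abs_dist(x_out))
```

In both cases sd() returns the larger of the two numbers.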

Summarize and group_by functions

Summarize function

  • Summarizes data.
df |> 
  summarize(name = sum(vector))
  • Different elements can be summarized:
df |> 
  summarize(name1 = sum(vector_a),
            name2 = mean(vector_z),
            n = n())

*Mind the argument na.rm = TRUE: without it, a single NA in the vector makes the summary NA.
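
A minimal sketch of the template above on made-up data. The tibble `df` and its columns `city` and `visitors` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (not from the course datasets)
df <- tibble(city = c("A", "A", "B", "B"),
             visitors = c(100, NA, 300, 200))

res <- df |>
  summarize(total = sum(visitors, na.rm = TRUE),    # NA is dropped before summing
            average = mean(visitors, na.rm = TRUE), # mean of the 3 non-missing values
            n = n())                                # number of rows, NA included
res
```

The result is a one-row tibble with one column per summary.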

Group_by

  • Always combined with another function (e.g. summarize, filter, mutate), it groups the data by the values of a vector.
df |> 
  group_by(vector_a) |> 
  summarize(name1 = sum(vector_z))

*With group_by and summarize, we change the unit of observation of the dataset.
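
A small sketch of the change in the unit of observation, again with a hypothetical tibble: the input has one row per visit record, the output one row per city.

```r
library(dplyr)

# Hypothetical toy data: five records in two cities
df <- tibble(city = c("A", "A", "B", "B", "B"),
             visitors = c(100, 150, 300, 200, 100))

by_city <- df |>
  group_by(city) |>                  # one group per distinct city
  summarize(total = sum(visitors),   # computed within each group
            n = n())                 # rows per group
by_city
```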

Recoding vectors

Recoding

When we recode variables (vectors), we lose information.

Target       Function
Binary       if_else()
Categorical  case_when()
Ordinal      factor()
Any          recode()
Other        as.numeric(), as.character(), as.Date(), etc.

Boolean operators

  • AND (&): TRUE if all conditions are met.
  • OR (|): TRUE if any condition is met.
  • NOT (!): TRUE if the condition is not met.
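
The three operators work element by element on logical vectors. A quick sketch with made-up vectors `age` and `student`:

```r
# Hypothetical toy vectors
age <- c(15, 30, 70)
student <- c(TRUE, TRUE, FALSE)

and_res <- age >= 18 & student   # both conditions must hold
or_res  <- age >= 65 | student   # at least one condition must hold
not_res <- !student              # reverses TRUE and FALSE

and_res
or_res
not_res
```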

If_else

  • Recodes a vector into a dichotomous / binary / dummy variable.
df |> 
  mutate(new_name = if_else(logic operation, true, false))
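
One way to fill in the template, using a hypothetical `income` column and an arbitrary cut-off of 1000:

```r
library(dplyr)

# Hypothetical toy data with one missing value
df <- tibble(income = c(500, 1500, 2500, NA))

df_bin <- df |>
  mutate(high_income = if_else(income > 1000, "high", "low"))  # NA stays NA
df_bin
```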

Case_when

case_when(logic operation ~ "C1",
          logic operation ~ "C2",
          logic operation ~ "C3",
          ...,
          TRUE ~ "CN")
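
A sketch of the same pattern with a hypothetical `income` column and arbitrary thresholds. Conditions are evaluated in order, so each row gets the label of the first condition it meets.

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(income = c(500, 1500, 2500, 4000))

df_cat <- df |>
  mutate(group = case_when(income < 1000 ~ "low",
                           income < 3000 ~ "middle",  # reached only if the first test fails
                           TRUE ~ "high"))            # everything else
df_cat
```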

Factor

df |> 
  mutate(new_vector = factor(vector, 
                             ordered = TRUE,
                             [levels or labels = ...]))
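
A minimal sketch with explicit levels. The tibble and the level names ("Low", "Middle", "High") are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data; the level names are made up for illustration
df <- tibble(income_group = c("Low", "High", "Middle", "Low"))

df_ord <- df |>
  mutate(income_ord = factor(income_group,
                             levels = c("Low", "Middle", "High"),  # from lowest to highest
                             ordered = TRUE))

levels(df_ord$income_ord)
df_ord$income_ord < "High"   # comparisons follow the level order
```

Without the levels argument, factor() would sort the levels alphabetically (High, Low, Middle), which is rarely the intended order.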

Recode

df |> 
  mutate(new_vector = recode(vector, 
                             old_value = "new_value"))
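
A sketch of the template on a made-up character column. For character vectors, values that are not listed are left unchanged.

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(code = c("a", "b", "c", "a"))

df_rec <- df |>
  mutate(label = recode(code,
                        a = "Apple",    # old value on the left, new value on the right
                        b = "Banana"))  # "c" is not listed, so it is kept as is
df_rec
```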

As functions

  • as.numeric(vector)
  • as.factor(vector)
  • as.character(vector)
  • as.integer(vector)
  • as.Date(vector)
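
A few conversions on made-up vectors, including a common pitfall: as.numeric() on a factor returns the internal level codes, not the original numbers.

```r
# Hypothetical toy vectors
x_chr <- c("1", "2.5", "10")

x_num <- as.numeric(x_chr)     # 1.0 2.5 10.0
x_int <- as.integer(x_num)     # decimals are truncated: 1 2 10
x_back <- as.character(x_int)  # "1" "2" "10"
d <- as.Date("2024-03-15")     # ISO format (year-month-day) is parsed by default

# Pitfall: a factor is converted to its level codes
f <- as.factor(c("10", "20", "10"))
codes <- as.numeric(f)                  # 1 2 1
values <- as.numeric(as.character(f))   # 10 20 10
```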

Last thing

Dplyr cheatsheet